Model-Based Relational RL When Object Existence is Partially Observable
نویسندگان
چکیده
We consider learning and planning in relational MDPs when object existence is uncertain and new objects may appear or disappear depending on previous actions or properties of other objects. Optimal policies actively need to discover objects to achieve a goal; planning in such domains in general amounts to a POMDP problem, where the belief is about the existence and properties of potential not-yet-discovered objects. We propose a computationally efficient extension of model-based relational RL methods that approximates these beliefs using discrete uncertainty predicates. In this formulation the belief update is learned using probabilistic rules and planning in the approximated belief space can be achieved using an extension of existing planners. We prove that the learned belief update rules encode an approximation of the exact belief updates of a POMDP formulation and demonstrate experimentally that the proposed approach successfully learns a set of relational rules appropriate to solve such problems.
منابع مشابه
The Dynamics of Multi-Agent Reinforcement Learning
Infinite-horizon multi-agent control processes with nondeterminism and partial state knowledge have particularly interesting properties with respect to adaptive control, such as the non-existence of Nash Equilibria (NE) or non-strict NE which are nonetheless points of convergence. The identification of reinforcement learning (RL) algorithms that are robust, accurate and efficient when applied t...
متن کاملA reinforcement learning scheme for a multi-agent card game with Monte Carlo state estimation
This article presents the state estimation method based on Monte Carlo sampling in a partially observable situation. We formulate an automatic strategy acquisition problem for the multi-agent card game “Hearts” as a reinforcement learning (RL) problem. Since there are often a lot of unobservable cards in this game, RL is dealt with in the framework of a partially observable Markov decision proc...
متن کاملA reinforcement learning scheme for a multi-agent card game
We formulate an automatic strategy acquisition problem for the multi-agent card game “Hearts” as a reinforcement learning (RL) problem. Since there are often a lot of unobservable cards in this game, RL is approximately dealt with in the framework of a partially observable Markov decision process (POMDP). This article presents a POMDP-RL method based on estimation of unobservable state variable...
متن کاملA PAC RL Algorithm for Episodic POMDPs
Many interesting real world domains involve reinforcement learning (RL) in partially observable environments. Efficient learning in such domains is important, but existing sample complexity bounds for partially observable RL are at least exponential in the episode length. We give, to our knowledge, the first partially observable RL algorithm with a polynomial bound on the number of episodes on ...
متن کاملMarket-Based Reinforcement Learning in Partially Observable Worlds
Unlike traditional reinforcement learning (RL), market-based RL is in principle applicable to worlds described by partially observable Markov Decision Processes (POMDPs), where an agent needs to learn short-term memories of relevant previous events in order to execute optimal actions. Most previous work, however, has focused on reactive settings (MDPs) instead of POMDPs. Here we reimplement a r...
متن کامل